1. Introduction

The identification of cell-type specific or brain-region specific transcripts is important because it
enables the use of peripheral biomarkers to identify disease perturbations in the brain, which can
then be localized to the region and cell type(s) affected by the disease. Thus, the identification and
characterization of these cell-type and brain-region specific transcripts will be of high utility for
the eventual diagnosis of a wide variety of brain diseases and brain trauma, such as the traumatic
exposure of soldiers to explosions in war. This year, we developed a novel classification method, the
"Eigen-Brain" approach, to identify new candidate cell-type specific transcripts, and we interpreted
the results biologically. In addition, we applied region-based clustering to the spatial expression
patterns of the cell-type specific genes, which revealed unique anatomic expression patterns for each
set of cell-type specific genes. We also focused on a deep analysis of the brain transcriptome using
high-throughput sequencing technologies.

2.  Body 

2.1.  Brain  Image  Preprocessing 

Our first task was to preprocess the in situ hybridization (ISH) brain images provided by the
Allen Brain Atlas (ABA) in order to remove artifacts such as noise, background, and inconsistent
orientation. To deal with these issues, we applied three preprocessing steps: Brain Extraction,
Noise Image Removal, and Image Registration. This preprocessing gives us a robust ISH image
training set that can be used to identify cell-type specific genes.

2.1.1.  Brain  Extraction 

In most brain images, the mouse brain has been placed in the center of the image by the
automatic image capture system of the ABA. However, the images have background margins of
different sizes, which need to be removed. For brain extraction, we remove the rows and columns
that do not contain any pixel information.
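
The margin removal described above can be sketched as follows. This is a minimal illustration, assuming the image is a list of pixel rows and that background pixels have the value 0; the report does not state the data format, and the helper name `crop_to_content` is hypothetical. This version trims only the outer empty rows and columns, so empty rows inside the brain are kept.

```python
def crop_to_content(image, background=0):
    """Drop border rows and columns that contain only background pixels."""
    rows = [r for r, row in enumerate(image) if any(v != background for v in row)]
    if not rows:
        return []  # the image is entirely background
    cols = [c for c in range(len(image[0]))
            if any(row[c] != background for row in image)]
    top, bottom = rows[0], rows[-1]
    left, right = cols[0], cols[-1]
    return [row[left:right + 1] for row in image[top:bottom + 1]]


# Toy 4x5 image: the content occupies rows 1-2 and columns 1-2.
image = [
    [0, 0, 0, 0, 0],
    [0, 0, 7, 0, 0],
    [0, 3, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
cropped = crop_to_content(image)
```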

2.1.2.  Noise  Image  Removal 

Some expression images show barely any expression pattern. Such images cannot be used as a
training set for effective learning. Thus, ISH images whose expression patterns cover less than
5% of the area and have less than 10% density are removed.

2.1.3.  Image  Registration 

The ISH brain images may be located at different positions or in different orientations. For an
exact and fair comparison, we apply the image registration method proposed by [1] to align the
images to the same position and orientation. This method considers a subset of affine
transformations, in which straight lines remain straight, without any curvature or perspective
distortion. An affine transformation combines linear operations such as rotation and scaling with
shifts. For the image registration, we perform only the coarse registration step of the method
[2]. This step allows us to align the ISH brain images with the reference image (Figure 1 and
Figure 2). However, many ISH gene expression images hardly have a recognizable pattern, which
makes it hard to compare them with the reference image directly. Thus, we apply a two-stage
image registration procedure. In the first stage, we use the original ISH images for image
registration and find the optimal parameters for the shift, rotation, and scaling operations,
which maximize the entropy and mutual information between the original brain image and the
reference image. In the second stage, once the optimal parameters are found, we apply the same
transformation to the ISH expression images (bottom left and right). Figure 1 and Figure 2 show
the original brain image before image registration (top left) and after image registration (top
right) for the coronal section and the sagittal section.
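
To make the first registration stage concrete, the sketch below computes the mutual information between two grayscale images from their joint histogram and then searches a small set of candidate shifts for the one that maximizes it. It is only an illustration under stated assumptions: the function names, the 8-bin histogram, and the wrap-around horizontal shift are hypothetical simplifications, whereas the method of [1] also optimizes rotation and scaling.

```python
import math
from collections import Counter


def mutual_information(image_a, image_b, bins=8, max_value=255):
    """Mutual information (in bits) between two equally sized grayscale images."""
    flat_a = [v for row in image_a for v in row]
    flat_b = [v for row in image_b for v in row]
    if len(flat_a) != len(flat_b):
        raise ValueError("images must have the same number of pixels")

    def to_bin(v):
        return min(int(v * bins / (max_value + 1)), bins - 1)

    pairs = [(to_bin(a), to_bin(b)) for a, b in zip(flat_a, flat_b)]
    n = len(pairs)
    joint = Counter(pairs)
    marg_a = Counter(a for a, _ in pairs)
    marg_b = Counter(b for _, b in pairs)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        # p_ab / (p_a * p_b) written in terms of raw counts
        mi += p_ab * math.log2(count * n / (marg_a[a] * marg_b[b]))
    return mi


def best_horizontal_shift(moving, reference, max_shift=2):
    """Try integer column shifts (with wrap-around) and keep the one with the highest MI."""
    def shift(image, dx):
        return [(row[-dx:] + row[:-dx]) if dx else list(row) for row in image]

    return max(range(-max_shift, max_shift + 1),
               key=lambda dx: mutual_information(shift(moving, dx), reference))


reference = [
    [0, 0, 255, 0, 0],
    [0, 255, 255, 0, 0],
    [0, 0, 255, 0, 0],
]
# The same pattern displaced one column to the left.
moving = [
    [0, 255, 0, 0, 0],
    [255, 255, 0, 0, 0],
    [0, 255, 0, 0, 0],
]
shift_found = best_horizontal_shift(moving, reference)
```

In the two-stage procedure, the parameters found on the original image (here `shift_found`) would then be reused unchanged to transform the corresponding expression image.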


Figure 1. Image registration example for the coronal section

Figure 2. Image registration example for the sagittal section (center image shows the procedure to
align the original image with the reference image)

2.2.  Feature  (i.e.  intensity  and  density)  extraction 


To classify the brain expression images of cell-type specific genes, we first divide each brain
expression image into a fixed number of patches N (e.g. N = 100) and extract two representative
features for each patch: intensity (or brightness) and density.


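\[
\mathrm{intensity}_{\mathrm{patch}} = \frac{\sum_{p \in \mathrm{Patch}} \mathrm{brightness}(p)}{|\mathrm{Patch}|}
\]

\[
\mathrm{density}_{\mathrm{patch}} = \frac{\sum_{p \in \mathrm{Patch}} I(p)}{|\mathrm{Patch}|}
\]

where I(p) is an indicator function that equals 1 if pixel p has an expression value and 0 otherwise.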
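
The two per-patch features can be computed as in the sketch below. It assumes a square grid of patches (so N = 100 would correspond to `patches_per_side=10`), an image whose sides are divisible by the grid size, and that "has an expression value" means a pixel value above a threshold of 0. These are assumptions for illustration, and `patch_features` is a hypothetical name.

```python
def patch_features(image, patches_per_side, threshold=0):
    """Return (intensity, density) lists with one value per patch, in row-major patch order."""
    height, width = len(image), len(image[0])
    ph, pw = height // patches_per_side, width // patches_per_side
    intensity, density = [], []
    for pr in range(patches_per_side):
        for pc in range(patches_per_side):
            pixels = [image[r][c]
                      for r in range(pr * ph, (pr + 1) * ph)
                      for c in range(pc * pw, (pc + 1) * pw)]
            # Mean brightness of the patch.
            intensity.append(sum(pixels) / len(pixels))
            # Fraction of pixels in the patch that are expressed (indicator I(p) = 1).
            density.append(sum(1 for v in pixels if v > threshold) / len(pixels))
    return intensity, density


image = [
    [0, 0, 10, 20],
    [0, 0, 30, 40],
    [5, 0, 0, 0],
    [0, 0, 0, 0],
]
intensity, density = patch_features(image, patches_per_side=2)
```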
Figure 3. The original and recovered images with a summarized intensity feature vector (upper) and the
distribution of the intensity feature over patches in a brain expression image (bottom).


Figure  4.  The  original  and  recovered  images  with  a  summarized  density  feature  vector  (upper)  and  the  distribution 
of  density  feature  over  patches  in  brain  expression  image  (bottom). 

The distributions of the intensity and density features over the patches of a brain expression image
are shown in Figure 3 and Figure 4. The images recovered from the summarized feature vectors (i.e.
intensity and density) confirm that the extracted feature vectors represent the original image well.

2.3. Training set for cell-type specific gene classification

We compiled three cell-type specific gene lists from the literature [3]: oligodendrocyte-enriched
genes, astrocyte-enriched genes, and neuron-enriched genes. There are two kinds of brain images
depending on the cutting plane of the brain: coronal sections and sagittal sections. Table 1 shows the
size of the training set per cell class. Some noisy images were removed by the preprocessing steps
described in Section 2.1.


Table 1. The size of the training set per cell class

                    Coronal Section    Sagittal Section
Oligodendrocytes    75                 182
Astrocytes          95                 231
Neurons             338                496
Total               508                909

2.4. Cell-type specific gene classification with SVM (support vector machine)

To validate the usefulness of the extracted features (i.e. intensity and density), we first performed
a classification experiment. We applied a standard SVM (the libsvm package [4]) for multi-class
classification. We experimented with various kernels, including the linear, polynomial, radial basis
function (radius), and sigmoid kernels, and the parameters for each kernel were optimized using
10-fold cross validation. We compared the classification results across kernels and brain sections
(Table 2 and Table 3).


Table 2. 10-fold cross validation accuracy for different kernels and brain sections

CV accuracy         Linear Kernel    Polynomial Kernel    Radius Kernel    Sigmoid Kernel
Coronal section     78.3465%         67.8161%             66.6667%         65.5172%
Sagittal section    70.7371%         65.7866%             60.176%          42.4642%


Table 3. Sensitivity, specificity, and precision for the coronal and sagittal sections with SVM

               Coronal                                     Sagittal
               Oligodendrocytes  Astrocytes  Neuron        Oligodendrocytes  Astrocytes  Neuron
Sensitivity    63%               53%         89%           57%               71%         76%
Specificity    95%               92%         65%           91%               81%         82%
Precision      70%               62%         84%           62%               56%         83%
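
For reference, the per-class measures in Table 3 follow from a multi-class confusion matrix in a one-versus-rest fashion. The sketch below shows the arithmetic; the confusion counts are invented for illustration and are not the counts behind Table 3, and `per_class_metrics` is a hypothetical helper.

```python
def per_class_metrics(confusion, labels):
    """confusion[i][j] = number of samples of true class i predicted as class j."""
    total = sum(sum(row) for row in confusion)
    metrics = {}
    for k, label in enumerate(labels):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp                 # class k samples predicted as another class
        fp = sum(row[k] for row in confusion) - tp  # other samples predicted as class k
        tn = total - tp - fn - fp
        metrics[label] = {
            "sensitivity": tp / (tp + fn),
            "specificity": tn / (tn + fp),
            "precision": tp / (tp + fp),
        }
    return metrics


labels = ["oligodendrocyte", "astrocyte", "neuron"]
confusion = [  # illustrative counts only
    [6, 1, 3],
    [1, 5, 4],
    [2, 2, 26],
]
metrics = per_class_metrics(confusion, labels)
```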

2.5. Eigen-Brain Approach

In this approach, we decompose the brain expression images into a small set of characteristic brain
expression image patterns called "Eigen-Brains". This technique was first introduced for face
recognition in [2]. An Eigen-Brain can be thought of as a visualization, or ghost image, of a principal
component of the training set; thus, the Eigen-Brains explain the variation in the expression patterns
of our brain images and represent the major characteristics of those patterns. Each individual brain
image can be represented as a linear combination of the Eigen-Brains. The Eigen-Brains are calculated
from our training set, keeping only the M Eigen-Brains that correspond to the largest eigenvalues.
The details of the method are explained in Section 2.5.1.

These M Eigen-Brains define a new feature space, and we project all brain expression images in our
training set into this new feature space (i.e. the Eigen-Brain space). This not only gives a large
dimensionality reduction but also yields clear expression patterns for cell-type specific genes, which
helps to characterize and define the cell-type specific expression patterns. After projection into
the new feature space, a weight vector is calculated for each image, which describes the contribution
of each Eigen-Brain to the representation of the image. It can also be thought of as the coordinates
of the image in the Eigen-Brain space. An unknown expression image is then classified by comparing its
weight vector with the weight vectors of the training set.
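
The projection described above can be sketched in plain Python as follows. This is an illustrative sketch rather than the implementation used in this work: it assumes each image is already flattened into a feature list, uses the usual eigenface shortcut of decomposing the small image-by-image matrix instead of the full pixel covariance, and finds eigenvectors by power iteration with deflation. All function names are hypothetical, and `m` must not exceed the rank of the centered data.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def top_eigenpairs(matrix, m, iterations=500):
    """Largest m eigenpairs of a symmetric PSD matrix (power iteration with deflation)."""
    n = len(matrix)
    work = [row[:] for row in matrix]
    pairs = []
    for _ in range(m):
        # Non-constant start: the all-ones vector lies in the null space of centered data.
        v = [float(i + 1) for i in range(n)]
        for _ in range(iterations):
            w = [dot(row, v) for row in work]
            norm = math.sqrt(dot(w, w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        eigenvalue = dot(v, [dot(row, v) for row in work])
        pairs.append((eigenvalue, v))
        # Deflate: remove the component that was just found.
        work = [[work[i][j] - eigenvalue * v[i] * v[j] for j in range(n)] for i in range(n)]
    return pairs


def fit_eigen_brains(images, m):
    """Return the mean image and m unit-length Eigen-Brains of the training images."""
    n = len(images)
    mean = [sum(col) / n for col in zip(*images)]
    centered = [[x - mu for x, mu in zip(img, mean)] for img in images]
    small = [[dot(a, b) / n for b in centered] for a in centered]  # n x n matrix
    eigen_brains = []
    for _, v in top_eigenpairs(small, m):
        # Map the small-matrix eigenvector back to pixel space and normalize it.
        u = [sum(v[i] * centered[i][p] for i in range(n)) for p in range(len(mean))]
        norm = math.sqrt(dot(u, u))
        eigen_brains.append([x / norm for x in u])
    return mean, eigen_brains


def project(image, mean, eigen_brains):
    """Weight vector of an image: its coordinates in the Eigen-Brain space."""
    diff = [x - mu for x, mu in zip(image, mean)]
    return [dot(u, diff) for u in eigen_brains]


images = [[0, 0, 0, 0], [2, 2, 0, 0], [0, 0, 4, 4], [2, 2, 4, 4]]
mean, eigen_brains = fit_eigen_brains(images, m=2)
weights = project(images[3], mean, eigen_brains)
# An image is recovered as the mean plus the weighted sum of Eigen-Brains.
reconstructed = [mu + sum(w * u[p] for w, u in zip(weights, eigen_brains))
                 for p, mu in enumerate(mean)]
```

In this toy data the centered images span only two directions, so two Eigen-Brains reconstruct every image; with real data, M is chosen to cover a target fraction of the variance.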

2.5.1.  Method 


Let the brain expression images of the training set be $\Gamma_1, \Gamma_2, \ldots, \Gamma_N$, where N
here denotes the number of training images. The average brain image of the training set is
$\Psi = \frac{1}{N} \sum_{n=1}^{N} \Gamma_n$, and each brain image differs from the average image by
$\Phi_i = \Gamma_i - \Psi$. The eigenvectors $u_k$ of the covariance matrix $C$ of our training set
are calculated from (1).

\[
C = \frac{1}{N} \sum_{n=1}^{N} \Phi_n \Phi_n^{T} \qquad (1)
\]

Once the eigenvectors are calculated, M Eigen-Brains are used to define the new feature space. All
images are transformed into this "Eigen-Brain space" by (2).

\[
\omega_k = u_k^{T} (\Gamma - \Psi), \quad \text{where } k = 1, \ldots, M \qquad (2)
\]

Thus, the weight vector ($\Omega$) for a new image ($\Gamma$) is
$\Omega^{T} = [\omega_1, \omega_2, \ldots, \omega_M]$, and this weight vector is compared with the
weight vectors in the training set. The cell-type classification decision is made by finding the
class (cell-type) label of the most similar gene expression images, borrowing the K-nearest-neighbor
(KNN) concept. For this experiment, we applied a unanimous vote scheme, so that new data are
classified only with high confidence and only very accurate cell-type specific genes are
returned.
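
The unanimous vote can be illustrated with the short sketch below. The Euclidean distance and k = 3 are assumptions made for the example (the text does not state the distance measure or the value of k), and `unanimous_knn` is a hypothetical name. An image is labeled only when all of its k nearest training neighbors share one label; otherwise no call is made.

```python
import math


def unanimous_knn(query, train_weights, train_labels, k=3):
    """Return the shared label of the k nearest training vectors, or None if they disagree."""
    order = sorted(range(len(train_weights)),
                   key=lambda i: math.dist(query, train_weights[i]))
    nearest = {train_labels[i] for i in order[:k]}
    return nearest.pop() if len(nearest) == 1 else None


# Toy weight vectors in a 2-dimensional Eigen-Brain space.
train_weights = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2],
                 [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
train_labels = ["neuron"] * 3 + ["astrocyte"] * 3

confident = unanimous_knn([0.1, 0.1], train_weights, train_labels)
ambiguous = unanimous_knn([2.6, 2.6], train_weights, train_labels)
```

Rejecting non-unanimous cases trades coverage for specificity, which matches the stated goal of returning only high-confidence candidates.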

2.5.2.  Classification  Result 

We applied the Eigen-Brain approach to our training set. The accuracy and specificity for the
sagittal section are reported in Figure 5 and Figure 6, and the results for the coronal section are
reported in Figure 7 and Figure 8. The x-axis in these figures represents the fraction of variance
covered by the Eigen-Brains; a larger fraction means that more Eigen-Brains are used as the new
feature space. The results show that the fraction of variance, i.e. the number of Eigen-Brains, does
not affect the classification accuracy. Moreover, our Eigen-Brain approach achieves very high
accuracy and high specificity, which indicates that it can classify cell-type specific genes with
high confidence. Although the specificity for neuron cell-type specific genes in the coronal section
is lower than the others, it is still high at 80%, and it is accompanied by high accuracy. In all
other cases, the specificity is more than 95% and the accuracy is around 99%. In particular, these
high specificities imply a low false positive rate, which allows us to identify real cell-type
specific genes.


Figure 5. Accuracy per cell class classification in sagittal section (legend: olig_accuracy,
astro_accuracy, neuron_accuracy)


Figure  6.  Specificity  for  cell  class  classification  in  sagittal  section 


Figure 7. Accuracy per cell class classification in coronal section (legend: olig_accuracy,
astro_accuracy, neuron_accuracy)

Figure 8. Specificity for cell class classification in coronal section

2.5.3. Eigen-Brains for each cell type

Highly expressed brain regions appear in each Eigen-Brain image. Figure 9 illustrates the
Eigen-Brains for neuron cell-type specific genes in the coronal section. The red circled region in
Figure 9 is the lateral ventricle (VL) region, which is related to neurological conditions and is on
average larger in patients with schizophrenia and bipolar disorder. The blue circled region is the
cerebral cortex (CTX), a sheet of neural tissue that forms the outermost layer of the cerebrum of
the mammalian brain and plays a key role in memory, attention, perceptual awareness, thought,
language, and consciousness.



Figure  9.  Eigen-Brains  for  neuron  specific  genes  in  coronal  section 

The red circled region in Figure 10 is the optic chiasm (och) region, which allows the right visual
field to be processed. A similar analysis can be applied to the other Eigen-Brains to identify the
specific regions that characterize each cell type.

Figure 10. Eigen-Brains for astrocytes specific genes in coronal section

The black circle in Figure 11 marks the nucleus of the lateral olfactory tract (NLOT) region, which is
highly expressed in oligodendrocyte specific genes.


Figure 11. Eigen-Brains for oligodendrocytes specific genes in coronal section

Figure 12, Figure 13, and Figure 14 show the Eigen-Brains for each cell type in the sagittal section.

Figure 13. Eigen-Brains for astrocytes specific genes in sagittal section

Figure 14. Eigen-Brains for oligodendrocytes specific genes in sagittal section


2.6. Test data set for cell-type specific gene identification

We downloaded all the original mouse brain images and their corresponding gene expression images
using a Perl script, which was written during the first quarter of last year. The script parsed the
XML files to extract the associated image file information and downloaded the corresponding images.
Table 4 shows the statistics of the test data set that will be used for our cell-type specific gene
identification. On average, each gene has about 2-3 ISH brain images from the section we are focusing
on. Thus, the number of genes in our test data set is smaller than the total number of ISH brain
images.


Table 4. Statistics of the test data set

                    # of image files    # of genes    Total size
Coronal Section     7341                ~3600         27 Gigabyte
Sagittal Section    29112               ~12000        145 Gigabyte

2.6.1.  Image  registration  for  Test  data  set 

The ISH brain images are not aligned to the same position or orientation. In addition, the image
sizes vary. Thus, for a fair comparison, the orientation must be aligned and the image size
adjusted. Here, we apply the image registration technique [1] to the in situ hybridization (ISH)
brain images provided by the Allen Brain Atlas (ABA) in order to remove artifacts such as noise,
background, and inconsistent orientation. This method considers a subset of affine
transformations, in which straight lines remain straight, without any curvature or perspective
distortion. An affine transformation combines linear operations such as rotation and scaling with
shifts. We apply a two-stage image registration procedure to transform the ISH expression images.
In the first stage, the original ISH images are used for image registration, and the optimal
parameters for the shift, rotation, and scaling operations are estimated by maximizing the entropy
and mutual information between the original brain image and the reference image. In the second
stage, these estimated optimal parameters are used to transform the ISH expression images.

Handling the test data set poses a new challenge because of the huge number of images and their
size (Table 4). On average, one image registration takes about 4 minutes, which means that
registering our whole test data set sequentially would take more than 3 months. Thus, we used our
cluster of 10 nodes with 38 CPUs, which allowed us to finish this process in two weeks.
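
The runtime estimate can be checked with a few lines of arithmetic using the image counts from Table 4. The "ideal" figure below assumes perfect scaling over all 38 CPUs, which is an assumption for illustration only.

```python
images = 7341 + 29112       # coronal + sagittal image files (Table 4)
minutes_per_image = 4       # average registration time reported above
cpus = 38                   # total CPUs in the 10-node cluster

sequential_days = images * minutes_per_image / 60 / 24
ideal_parallel_days = sequential_days / cpus

print(f"sequential: {sequential_days:.0f} days")
print(f"ideal on {cpus} CPUs: {ideal_parallel_days:.1f} days")
```

The sequential estimate comes to roughly 101 days, consistent with "more than 3 months". The ideal parallel estimate is under 3 days, so the reported two weeks implies that the run was far from perfectly parallel; the text does not break down where the remaining time went.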

2.6.2.  Classification  using  EigenBrain  approach 

In this project, we have proposed a new approach to identify new candidate cell-type specific
genes using what we call EigenBrains. Here, we briefly summarize the overall procedure
(Figure 15). First, we apply an image registration technique to align all the brain expression
images. After image registration, the specific features (e.g. density per patch) are extracted.
This procedure results in very high-dimensional data. Thus, to reduce the data dimensionality, we
apply our EigenBrain approach, which projects the original feature vector into the low-dimensional
EigenBrain space. This transformation reduces the high-dimensional feature space to a
low-dimensional one and is shown to greatly improve the accuracy with which our algorithm
identifies cell-type specific transcripts.


Figure 15: Overall procedure for classification

This reduced feature set is made more biologically meaningful by representing regions instead of
independent patches (Figure 16). Once the original features were transformed into the new feature
vector, we applied a K-nearest neighbors (KNN) algorithm with a unanimous vote scheme to identify the
candidate cell-type specific genes.


Figure 16: Transformation of original feature into the EigenBrain space (a high-dimensional feature
vector, e.g. (0.2, 0.82, 0.42, ..., 0.61), is mapped to a low-dimensional feature vector, e.g.
(0.43, 0.23, 0.1, ..., 0.32))


2.7. Region-based clustering for different cell-type specific genes

To characterize the cell-type specific genes, we applied region-based clustering to the spatial gene
expression. For computational efficiency, we first reduced the original in situ hybridization images
to 300x300 pixels using bicubic interpolation, in which each output pixel is a weighted average of
the pixels in the nearest 4x4 neighborhood. Thus, each pixel represents the averaged expression
values of approximately 600 pixels on a coronal section and 1100 pixels on a sagittal section. Then,
we applied the K-means clustering algorithm to group the summarized pixels based on their expression
across all cell-type specific genes with the same cell specificity. The results for each set of
cell-type specific genes are reported in Table 5 and Table 6.
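
The clustering step treats every summarized pixel as a point whose coordinates are its expression values across the genes of one cell type. A minimal sketch of Lloyd's K-means on such pixel profiles is shown below; seeding the centroids with the first k points and the toy data are illustrative assumptions, and a practical run would use a more careful initialization.

```python
import math


def kmeans(points, k, iterations=50):
    """Plain Lloyd's algorithm; the first k points seed the centroids (illustrative only)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(centroids[c])  # keep the centroid of an empty cluster
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


# Each row is one pixel; each column is that pixel's expression in one gene.
pixel_profiles = [
    [0.9, 0.8, 1.0],
    [1.0, 0.9, 0.9],
    [0.1, 0.0, 0.2],
    [0.0, 0.1, 0.1],
]
labels, centroids = kmeans(pixel_profiles, k=2)
```

Reshaping `labels` back to the 300x300 pixel grid gives a cluster map in which each cluster can be drawn in its own color.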


Table 5: Region-based clustering based on coronal section of brain images (cluster maps per number of
clusters K, e.g. K=3 and K=4; columns: Neuron, Oligodendrocytes, Astrocytes)



2.8. The discovery of brain anatomical regions on coronal and sagittal sections

Table 5 shows the region-based clustering results on the coronal section for the cell-type specific
genes of three different cell types (i.e. neurons, astrocytes, oligodendrocytes). The most intriguing
characteristic of these spatial gene expression patterns was found in the neuron cell-type specific
genes (left column of Table 5). As can be seen, the region-based clustering reveals strikingly clear
patterns for neuron cell-type specific genes, consistent with classical anatomic brain structure. In
particular, as the number of clusters increases, new anatomic brain regions split off from the
previous clustering results. As shown in the left column of Table 5, the cerebral cortex (CTX)
region, the gray matter, is distinctly separated from the white matter region of the brain at the
beginning. As K is increased to 8, the caudoputamen (CP), globus pallidus internal and external
segments (GPi, GPe), inner cerebral cortex (CTX), piriform cortex (PIR), corpus callosum (CC),
fimbria (Fi), thalamus (TH), and other regions are clearly separated from the white matter region of
the brain, in the order named. This reveals an overall concordance between the spatial gene
expression patterns and the anatomical regions of the mouse brain. For example, after the separation
of the cerebral cortex region, the caudoputamen region emerges distinctly. The caudoputamen region is
known to be related to the neurogenesis process, which is important for learning and memory. Thus,
the clear separation of this brain region in neuron cell-type specific genes can be explained by its
functional connection with neuronal cells. However, such a clear anatomical pattern could not be
detected for the other cell types (i.e. oligodendrocytes and astrocytes); instead, noisier patterns
are observed compared to the neuron cell-type specific genes.

Table 6 shows a similar anatomical structure in the region-based clustering of spatial gene
expression on the sagittal section. In particular, the cerebral cortex (CTX) and anterior olfactory
nucleus (AON) are separated very clearly. In addition, the dentate gyrus (DG), part of the
hippocampal formation (HP), which is implicated in the formation of new memories, also emerges
distinctly. This region is known to be associated with high rates of neurogenesis in adult humans
[5]. Such a clear separation of this anatomical brain region is revealed especially in the neuron
specific genes on both the sagittal and coronal sections. Furthermore, astrocyte specific genes show
a striking region-specific expression pattern in the cerebellar cortex (CBX) region.


Table 6: Region-based clustering for sagittal section brain images (cluster maps per number of
clusters; columns: Neuron, Oligodendrocytes, Astrocytes)


2.9. Identification of uniquely co-regulated brain regions for each cell type

Another interesting observation is the discovery of co-expressed brain anatomical regions for each
cell type. At K=8, the caudoputamen (CP) region is clustered together with the paraventricular
nucleus of the thalamus (PVT) region in both oligodendrocyte and astrocyte cell-type specific genes
(marked with a white arrow in Table 5). The columns of the fornix (Fx) region are also clustered with
the bed nuclei of the stria terminalis (BST) region in oligodendrocyte specific genes (marked with a
white arrow in Table 5). For neuron specific genes, the lateral septal nucleus, caudal (caudodorsal)
part (LSc), triangular nucleus of septum (TRS), and septofimbrial nucleus (SF) regions are
co-expressed with the globus pallidus external segment (GPe) region. Furthermore, unlike in the other
cell types, the lateral ventricle (VL) region is never expressed for neuron cell-type specific genes
across all values of K on both brain sections. The VL region is known to increase in size with age
and to be enlarged in a number of neurological conditions. Furthermore, this region is usually larger
in patients with schizophrenia, bipolar disorder, and Alzheimer's disease than in healthy people.
Regardless of the K value, there are several anatomical regions that are co-expressed on the sagittal
section as well (Table 6). For example, the cerebral cortex (CTX) and anterior olfactory nucleus
(AON), and the caudoputamen (CP) and nucleus accumbens (ACB), show similar expression patterns and
cluster together for neuron cell-type specific genes.

2.10. Correlation matrix for region-based clustering

Figure 17 illustrates the correlation matrix between the left and right hemispheres. The bottom left
corner represents the correlation of the center positions of the two hemispheres.

Figure 17: Correlation Matrix between left and right hemisphere

Table 7 shows the correlation matrices between the left and right hemispheres for the
region-based clustering results on coronal sections (Table 5). The diagonal corresponds to exactly
symmetric positions in the two hemispheres. In the human brain, certain functions seem to be
controlled predominantly by the left or the right hemisphere. For example, language ability is
usually dominated by the left hemisphere, while spatial recognition is controlled by the right
hemisphere [6]. However, this functional asymmetry is not reflected in what we observed in the ISH
spatial expression. Based on the gene expression or cell distribution, there is no clear
difference between the left and right hemispheres from the perspective of spatial gene expression.
Furthermore, the symmetric pattern is most evident in neuron cell-type specific genes (the first
column of Table 7). Unlike for the other cell-type specific genes, the diagonal of the correlation
matrix of the neuron cell-type specific genes is shown in red, which indicates clear symmetry
between the left and right hemispheres regardless of the K value. In contrast, oligodendrocyte
specific genes show the least symmetric pattern of all cell types. Although much research has
discussed the asymmetry of brain functions and expression (Sun et al., 2005), it has also been
reported that such asymmetry is detected only at earlier stages in the fetal brain and that the
clear left-right expression difference diminishes at the latest stage examined (19-week-old human
brain) [7]. Since the Allen Brain Atlas (ABA) in situ hybridization images are obtained from
56-day-old adult mouse brains, such an asymmetric expression pattern cannot be detected, unlike in
a fetal brain. This is supported by the in situ hybridization images of genes known to be
asymmetrically expressed [7], shown in Figure 18.


Table 7: Correlation Matrix between left and right hemisphere (correlation matrices per number of
clusters; columns: Neuron, Oligodendrocytes, Astrocytes)



Figure 18 shows the in situ hybridization images of representative genes reported as
asymmetrically expressed by [6]. Three genes (i.e. BAIAP2, NEUROD6, SH3GL2) in the left column of
Figure 18 were reported as highly expressed in the left hemisphere, while IGFBP5, LMO4, and STMN4
(right column of Figure 18) were verified as differentially expressed in the right hemisphere of
12-week-old human fetal brains through either real-time reverse transcription (RT) PCR or in situ
hybridization [6]. However, as can be seen in Figure 18, those genes do not show a clear
asymmetric expression pattern in the in situ hybridization images from the ABA. As we pointed out,
such asymmetry tends to appear in early fetal brains; it may not necessarily be present in adult
brains. Another possible reason is that subtle expression differences between the left and right
hemispheres that appear in only a small number of genes can fade away, because we average the
expression patterns of all cell-type specific genes for region-based clustering.

Figure 18: In situ hybridization images of genes known to be asymmetrically expressed

In order to compare the degree of correlation among the different cell-type specific genes, we
calculated the Pearson correlation between the left and right hemispheres (the diagonal in Table 7);
Figure 19 shows the correlation across all K values. As shown, neuron specific genes reveal
prominently symmetric expression patterns, while oligodendrocyte specific genes show the least
symmetric expression pattern.
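
A simplified way to quantify left-right symmetry is to correlate each position in the left half of a coronal image with its mirror position in the right half, as sketched below. This is only an illustration of the Pearson computation on mirrored positions: it works directly on a small pixel grid and assumes the midline is the central column, whereas the analysis above is based on the clustering results. The function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def hemisphere_symmetry(image):
    """Correlate the left half of each row with the mirrored right half."""
    half = len(image[0]) // 2
    left = [v for row in image for v in row[:half]]
    right_mirrored = [v for row in image for v in reversed(row[-half:])]
    return pearson(left, right_mirrored)


symmetric = [[1, 2, 2, 1], [0, 5, 5, 0], [3, 4, 4, 3]]
asymmetric = [[1, 2, 0, 9], [0, 5, 1, 7], [3, 4, 2, 8]]
r_symmetric = hemisphere_symmetry(symmetric)
r_asymmetric = hemisphere_symmetry(asymmetric)
```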

2,11,  T-test  to  identify  differentially  expressed  regions  for  each  cell  type 

Figure 19: Correlation between left and right hemispheres

To investigate the highly and lowly expressed brain regions for each cell type, we applied an unpaired
t-test between the different cell-type specific gene sets at two levels: intensity and density. At the
gene expression intensity level, we reduced the original in situ hybridization images to 300x300
pixels using bicubic interpolation, as we did for K-means clustering, and applied quantile
normalization. The t-test was then applied to identify the highly differentially expressed regions
(P value < 0.05). This test can reveal the unique anatomic brain regions that are highly expressed for a
specific cell type. A similar analysis was done at the density level. For this test, the original images
were divided into 100x100 patches, and each patch is represented by its density value, which measures
how many pixels in the patch are expressed. The density-level t-test helps us identify the highly
differentially expressed anatomical brain regions, and the result reflects the distribution of cells in
the brain.
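
The two levels of the test can be sketched as below. This is an illustration under stated assumptions, not the implementation used here: the report does not say whether equal variances were assumed, so the sketch uses Welch's unequal-variance statistic; the function names welch_t and patch_density and the patch-size parameter are hypothetical. Turning the statistic into a P value needs the t-distribution CDF (for example scipy.stats.ttest_ind with equal_var=False), which is left out to keep the sketch dependency-free.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Unpaired (Welch) t statistic and degrees of freedom.

    a, b: values of one pixel (or one patch) across the images of two
    cell-type gene sets. Both groups need at least two values and a
    non-zero combined variance.
    """
    var_a, var_b = variance(a), variance(b)
    n_a, n_b = len(a), len(b)
    se2 = var_a / n_a + var_b / n_b
    t = (mean(a) - mean(b)) / sqrt(se2)
    df = se2 ** 2 / ((var_a / n_a) ** 2 / (n_a - 1) +
                     (var_b / n_b) ** 2 / (n_b - 1))
    return t, df


def patch_density(mask, patch):
    """Fraction of expressed pixels in each patch of a binary mask.

    mask: list of rows of 0/1 values; patch: patch side length in pixels
    (edge patches may be smaller).
    """
    rows, cols = len(mask), len(mask[0])
    grid = []
    for r0 in range(0, rows, patch):
        grid_row = []
        for c0 in range(0, cols, patch):
            block = [mask[r][c]
                     for r in range(r0, min(r0 + patch, rows))
                     for c in range(c0, min(c0 + patch, cols))]
            grid_row.append(sum(block) / len(block))
        grid.append(grid_row)
    return grid


print(welch_t([1, 2, 3], [4, 5, 6]))
print(patch_density([[1, 0, 0, 0], [1, 1, 0, 0]], 2))
```

In the intensity-level test the statistic would be computed once per pixel of the 300x300 images; in the density-level test it would be computed once per patch.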

2.12.  Microarray analysis


We studied gene expression in the human and mouse "atlas" from the GNF data sets (NCBI GEO
GSE1133) and expanded our knowledge with other microarray studies (Table 8).


Table 8. Microarray datasets utilized

NCBI GEO   Species   # Tissues   Reference
GSE1133    Human     79          [8]
GSE3526    Human     65          NA
GSE2361    Human     36          [9]
GSE1133    Mouse     61          [8]
GSE9954    Mouse     22          [10]
GSE10246   Mouse     96          [11]

All the selected datasets are public for academic use. Few samples are shared between them, because each
author selected samples with different criteria. We basically looked for a genomic chip (Affymetrix in
particular) and one or more brain-related samples in the dataset. After reading the raw data (CEL files)
and applying background correction and RMA normalization in R/Bioconductor (affy package), we collected
the expression patterns of almost all the genes in the human and mouse genomes in a local database. We
then used the same criteria to select brain-specific genes, that is, a 10-fold enrichment in one brain
sample compared with the maximal expression level detected in the non-brain samples. In addition, the
probe must have a P-value < 0.05. We processed each dataset separately to reduce noise.
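
The 10-fold enrichment criterion can be sketched as below. This is only an illustration: the data layout (a dictionary of probes to per-sample values), the function name brain_specific, and the assumption that expression values are on a linear scale are all hypothetical. RMA output is normally log2-scaled, in which case a 10-fold ratio corresponds to a difference of about 3.32. The P-value filter is not shown because the report does not say which test produced it.

```python
def brain_specific(expr, brain_samples, fold=10.0):
    """Select probes enriched at least `fold` times in a brain sample.

    expr: dict mapping probe -> dict mapping sample name -> expression
          value (ASSUMED linear scale, not log2).
    brain_samples: set of sample names considered brain-related.
    A probe is kept when its highest brain value is at least `fold`
    times the maximal value over all non-brain samples.
    """
    selected = []
    for probe, values in expr.items():
        brain = [v for s, v in values.items() if s in brain_samples]
        other = [v for s, v in values.items() if s not in brain_samples]
        if not brain or not other:
            continue  # the criterion needs both groups
        if max(brain) >= fold * max(other):
            selected.append(probe)
    return selected


example = {
    "probe_1": {"brain": 1000.0, "liver": 50.0, "heart": 90.0},
    "probe_2": {"brain": 500.0, "liver": 100.0, "heart": 20.0},
}
print(brain_specific(example, {"brain"}))  # ['probe_1']
```

Comparing against the maximum of the non-brain samples, rather than their mean, makes the filter strict: a single non-brain tissue with moderate expression is enough to reject a probe.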

We compared the genes selected in each dataset, but did not see much overlap (Figure 20).


Figure 20. Brain-specific genes overlap between microarray experiments (panels: Human, Mouse).


We have already predicted a "secretable" probability using the SignalP program [12], but we noted that many
isoforms of a gene have quite different probabilities of being secreted. We are now integrating RNA-seq data
to uncover the isoforms expressed in brain samples.

2.13.  RNA-seq analysis


Another exploration of the brain-specific genes was to include RNA-seq data. Our interest is to discover the
transcript-specific patterns in brain, as mentioned before. We used the datasets from NCBI GEO
GSE12946 and GSE13652 [13,14], which in combination measure the mRNA levels for 12 human tissues.
We mapped and aligned the reads for each dataset to the human reference genome and the gene models
(hg18 of UCSC Genome DB) with Bowtie [15], a short-read aligner (Table 9).

Table 9. Reads mapped to the human reference genome and the gene models

GEO         Description       Total reads   % unique   % genome   % genes
GSM325476   brain             17,246,957    57.35%     62.62%     18.68%
GSM325477   liver             18,517,121    45.60%     57.05%     15.53%
GSM325478   heart             20,169,301    41.81%     55.04%     21.66%
GSM325479   skeletal muscle   22,640,454    46.70%     60.60%     20.93%
GSM325480   colon             28,435,996    48.65%     60.45%     19.58%
GSM325481   adipose           27,752,231    53.47%     62.07%     18.32%
GSM325482   testes            27,303,938    56.58%     67.68%     18.20%
GSM325486   breast            16,120,746    61.41%     66.31%     15.09%
GSM325483   lymph node        27,492,254    50.40%     61.94%     13.98%
GSM343512   cerebral cortex   31,940,303    68.61%     62.63%     9.22%
GSM343515   lung              25,862,064    62.77%     59.12%     8.37%
GSM343511   brain             17,246,964    57.35%     62.62%     6.99%

In the RNA-seq analysis we excluded reads with more than one hit in the genome or in the gene models and
kept only reads with a perfect match (no mismatches). We then counted the reads per kbp for each gene
model. Because the total number of reads varies between samples, we used a global normalization to
obtain standard values for each gene model (Figure 21).
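
The counting steps above can be sketched as follows. This is a hedged illustration: the alignment tuple layout, the function names, and the interpretation of "global normalization" as scaling by the total number of reads per sample (per million by default) are assumptions, not details given in this report.

```python
from collections import Counter


def count_unique_perfect(alignments):
    """Count reads per gene model, keeping unique perfect matches only.

    alignments: iterable of (read_id, gene, n_hits, mismatches) tuples,
    a HYPOTHETICAL summary of the aligner output.
    """
    counts = Counter()
    for _read_id, gene, n_hits, mismatches in alignments:
        if n_hits == 1 and mismatches == 0:
            counts[gene] += 1
    return dict(counts)


def reads_per_kb(counts, lengths):
    """Divide each gene's read count by its model length in kbp."""
    return {g: counts[g] / (lengths[g] / 1000.0) for g in counts}


def global_normalize(rpk, total_reads, scale=1e6):
    """Scale reads-per-kbp values by the sample's total read count."""
    return {g: v * scale / total_reads for g, v in rpk.items()}


alignments = [("r1", "A", 1, 0), ("r2", "A", 2, 0),
              ("r3", "B", 1, 1), ("r4", "A", 1, 0)]
counts = count_unique_perfect(alignments)          # {'A': 2}
rpk = reads_per_kb(counts, {"A": 2000, "B": 500})  # {'A': 1.0}
print(global_normalize(rpk, total_reads=4))
```

Dividing by gene length removes the bias toward long transcripts, and dividing by the total read count makes samples with different sequencing depth comparable.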


Figure 21. RNA-seq read counts: (a) raw data (panel title: Raw RNA-seq) and (b) after global normalization (panel title: Norm RNA-seq).

We compared the values for brain samples and the global expression in all the datasets (Figure 22 and
Figure 23).

Figure 22. RNA-seq gene expression values for (a) brain and cerebral cortex and (b) brain and heart. All values are
log10 scaled.


Figure 23. Cluster dendrogram for all the RNA-seq datasets (panel title: RNA-Seq samples; clustering: dist(data.t), hclust with "complete" linkage).

Using the same brain-specificity method on the RNA-seq data, we obtained a list of 798 transcripts
which are potentially brain-specific.

Table 10: RNA-seq samples

Sample  Description        Species  Platform  Read length (bp)  Reads        Gb
NHA     Normal astrocytes  Human    SOLiD     50                15,133,796   0.76
H683    GBM cell line      Human    SOLiD     50                7,693,226    0.38
LN18    GBM cell line      Human    SOLiD     50                13,707,723   0.69
LN229   GBM cell line      Human    SOLiD     50                13,129,539   0.66
MJ      GBM cell line      Human    SOLiD     50                12,756,653   0.64
MK      GBM cell line      Human    SOLiD     50                13,145,984   0.66
T98     GBM cell line      Human    SOLiD     50                2,746,698    0.14
U87     GBM cell line      Human    SOLiD     50                30,998,618   1.55
S01     GBM tumor          Human    Illumina  75                22,388,016   1.68
S02     GBM tumor          Human    Illumina  75                21,456,618   1.61
S03     Normal brain       Human    Illumina  75                21,258,049   1.59
S04     GBM tumor          Human    Illumina  75                22,058,350   1.65
S06     GBM tumor          Human    Illumina  75                22,430,149   1.68
S07     GBM tumor          Human    Illumina  75                21,513,273   1.61
S08     GBM tumor          Human    Illumina  75                21,667,433   1.63
B01     GBM tumor          Mouse    SOLiD     50                47,603,332   2.38
B02     GBM tumor          Mouse    SOLiD     50                56,857,949   2.84
B03     GBM tumor          Mouse    SOLiD     50                60,794,006   3.04
B04     GBM tumor          Mouse    SOLiD     50                54,238,308   2.71
B05     GBM tumor          Mouse    SOLiD     50                58,347,341   2.92
B06     GBM tumor          Mouse    SOLiD     50                45,976,615   2.3
B07     GBM tumor          Mouse    SOLiD     50                50,157,968   2.51
B08     GBM tumor          Mouse    SOLiD     50                50,668,644   2.53
TOTAL                                                           686,728,288  38.16

We started with public data from the original RNA-seq reports, but we are now incorporating
fresh data from our own lab as well, in particular from normal samples and glioblastoma cell
lines and tumors. For example, we integrated large-scale data from two different technologies,
ABI SOLiD and Illumina. Both technologies require similar approaches in analysis, but the
analysis and interpretation of RNA-seq is a current challenge for bioinformatics due to the quantity
of data (on the order of several GB per run), event detection (e.g. exon expression, exon-junction
detection), normalization between samples, and technical error detection. Currently, we are
evaluating and testing some of the most recent tools for RNA-seq analysis, in particular
for alternative isoform detection.

Figure 24: Expression levels for gene PKM2 in the SOLiD RNA-seq samples

2.14.  Quantification of gene expression in human brain

We focused on the integration of other sources for gene expression quantification, such as RNA-seq
technology (Figure 25 and Figure 26). We developed a general pipeline to analyze this type of data
independently of the technology (Illumina or ABI SOLiD). We also expanded our brain expression
data sets with data from Illumina Inc., which provide experimental data from normal brain and other
tissues at high coverage with single and paired reads.


Figure  25:  General  view  of  the  RNA-seq  pipeline,  mapping  modules 


Figure  26:  General  view  of  the  mining  modules  of  the  RNA-seq  pipeline 

3.  Key Research Accomplishments


3.1.  Biological  investigation  of  EigenBrain  image  for  each  cell  type 

>  Oligodendrocytes 

Oligodendrocytes are responsible for the insulation of axons in the central nervous system.
Figure 27 and Figure 28 show the highly expressed brain regions in the EigenBrains computed
from the oligodendrocyte-specific markers for both the coronal section and the sagittal section.


Figure 28: EigenBrain image for Oligodendrocytes enriched genes in sagittal section
>  Astrocytes 

Astrocytes are known for providing critical biochemical support to the endothelial cells
that form the blood-brain barrier. They also provide nutrients to the nervous tissue and help in
the repair and scarring process of the brain following traumatic injuries. The EigenBrain images
in Figure 29 and Figure 30 show highly expressed brain regions whose roles are closely
related to astrocytes. The circled regions in Figure 29 indicate the OCH (Optic Chiasm) region,
which allows the right visual field to be processed; the HY (Hypothalamus) region, which links the
nervous system to the endocrine system via the pituitary gland; and the CP (Caudoputamen)
region, which relates to cognition and working memory.


Figure  29:  EigenBrain  Image  for  Astrocytes  enriched  genes  in  coronal  section 

Figure 30 also reveals highly expressed regions in the sagittal section for astrocyte-enriched genes:
CTX (Cerebral Cortex) and MDRN (Medullary Reticular Nucleus). The CTX region is a sheet
of neural tissue that is outermost to the cerebrum of the mammalian brain and plays a role in
memory, attention, perceptual awareness, thought, language, and consciousness. The MDRN
region is responsible for controlling several major autonomic functions of the body.


Figure  30:  EigenBrain  Image  for  Astrocytes  enriched  genes  in  sagittal  section 


>  Neurons 


Neurons are core components of the nervous system. Our EigenBrain approach displayed
biologically meaningful patterns in different brain regions. Figure 31 and Figure 32 show the
highly expressed brain regions for neurons in coronal and sagittal sections. In Figure 31, the
marked regions are representative regions such as VS (ventricular systems), CTX
(cerebral cortex), HPF (Hippocampal Formation), PIR (piriform cortex), and TH (Thalamus).
More specifically, the VS region usually increases in size with age and is enlarged in a number of
neurological conditions. The HPF region has a key role in long-term memory and spatial
navigation, and in Alzheimer's disease it is often one of the first regions of the brain
to suffer damage. The PIR and TH regions are related to the sensory system, linking image
and smell, and are responsible for the regulation of consciousness, sleep, and alertness, respectively.
These discovered regions are consistent with the ones from the sagittal sections (Figure 32).


Figure  32:  EigenBrain  Image  for  Neuron  enriched  genes  in  sagittal  section 

3.2.  More investigation of the EigenBrain approach

3.2.1.  Symmetry  pattern  in  EigenBrain  image 

To check whether the symmetric expression patterns in the EigenBrain images come from a
bias of the EigenBrain approach itself, we applied the following test: all original images were
rotated by 45 degrees and the EigenBrain approach was applied to the rotated images. As can be
seen from Figure 33, even after rotating the original images, the symmetric expression patterns
can still be detected in the EigenBrain images, which means those symmetries are not a bias of
the method itself.

Figure 33: EigenBrain image from rotated brain images

3.2.2.  Applying  the  EigenBrain  approach  to  the  test  dataset 

The EigenBrain approach has been applied to a large volume of test data (Table 11). As can be
seen, the number of in situ hybridization (ISH) brain images is larger than the number of genes,
which means that each gene usually has 1 or 2 corresponding brain images.

Table 11: Statistics of Test dataset

                   # of image files   # of genes
Coronal section    7340               4034
Sagittal section   29110              19446

We  applied  our  EigenBrain  approach  to  the  test  dataset  to  identify  the  cell  type  specific  genes. 
These  genes  are  discovered  from  both  coronal  and  sagittal  sections  as  cell-type  specific  genes. 
We  included  a  subset  of  the  candidate  cell-type  specific  genes  in  Table  12. 


Table 12: Candidate cell type specific gene

Cell type         # of candidate cell type specific genes   Candidate cell type specific gene list
Oligodendrocytes  37                                        Adamts4, Anln, Arrdc3, BC030477, Car2, Cldn11, Edg2, Efnb3, Elovl1, Enpp6, Fa2h, Galnt6, Gatm, etc.
Astrocytes        23                                        Acaa2, A1987712, C230095G01Rik, Capsl, Cldn1, Decr1, Dip3b, Dlx6os1, E030013G06Rik, etc.
Neuron            363                                       1700010C24Rik, 1700020C11Rik, 2010004A03Rik, A830018L16Rik, A930041I02Rik, Aacs, Abhd6, Adcy1, Adcy9, Ap3s1, Arf3, etc.

3.3.  Identification  of  candidate  cell  type  specific  genes 

3.3.1.  Oligodendrocytes 

From applying our algorithm to the analysis of the brain images, we identified EFNB3 as one of
our candidate oligodendrocyte-enriched genes (Figure 34). As can be seen from Figure 34, the
images reveal high expression especially in the Alveus and Fimbria, which are known as
important regions of heavy oligodendrocyte expression [16]. Reference [3] also confirmed a
consistent result through microarray expression data. EFNB3 is a member of the ephrin gene
family and is very important in brain development as well as its maintenance, particularly in the
nervous system. The right panel in Figure 34 shows the GO information related to
EFNB3. The "Axon Guidance" process is detected as one of the main GO processes enriched for
oligodendrocytes.


Labeled regions in the image: Alveus, Fimbria

Pathways and Processes

GeneGo Process Networks
No.
1  Cell adhesion: Attractive and repulsive receptors
2  Cell adhesion: Synaptic contact
3  Development: Neurogenesis: Axonal guidance
4  Development: Neurogenesis: Synaptogenesis

GO Processes
1  adult walking behavior
2  axon choice point recognition
3  axon guidance
4  cell differentiation
5  cell-cell signaling
6  ephrin receptor signaling pathway
7  interspecies interaction between organisms
8  multicellular organismal development
9  nervous system development

GO Molecular Functions
No.
1  ephrin receptor binding
2  transmembrane-ephrin receptor activity

Figure 34: ISH brain image and corresponding expression image for the oligodendrocyte-specific gene EFNB3
(Ephrin-B3) and its related Gene Ontology pathway and process

Figure 35 shows the enrichment of GO categories for the candidate oligodendrocyte-specific genes.
Axon ensheathment, ensheathment of neurons, myelination, and lipid biosynthetic process are
identified as critical pathways with P-value < 0.01. These GO categories are also known from the
literature as major functions of oligodendrocytes [3].

GO_ID       TERM                                               NB_IN_REF  FREQ_IN_REF  NB_IN_SET  FREQ_IN_SET  P_VALUE      GENES_IN_SET
GO:0008366  axon ensheathment                                  33         0.0023       3          0.1034       3.60E-05     Mbp,Cldn11,Ugt8a
GO:0007272  ensheathment of neurons                            33         0.0023       3          0.1034       3.60E-05     Mbp,Cldn11,Ugt8a
GO:0019228  regulation of action potential in neuron           37         0.0025       3          0.1034       5.09E-05     Mbp,Cldn11,Ugt8a
GO:0001508  regulation of action potential                     45         0.0031       3          0.1034       9.17E-05     Mbp,Cldn11,Ugt8a
GO:0006633  fatty acid biosynthetic process                    86         0.0059       3          0.1034       0.000615     Elovl1,Fa2h,Ptgds
GO:0016053  organic acid biosynthetic process                  93         0.0063       3          0.1034       0.00077      Elovl1,Fa2h,Ptgds
GO:0046394  carboxylic acid biosynthetic process               93         0.0063       3          0.1034       0.00077      Elovl1,Fa2h,Ptgds
GO:0042391  regulation of membrane potential                   105        0.0072       3          0.1034       0.001088     Mbp,Cldn11,Ugt8a
GO:0042552  myelination                                        31         0.0021       2          0.069        0.001666     Mbp,Ugt8a
GO:0008610  lipid biosynthetic process                         276        0.0188       4          0.1379       [illegible]  Elovl1,Fa2h,Ptgds,Ugt8a
GO:0015670  carbon dioxide transport                           1          0.0001       1          0.0345       0.001978     Car2
GO:0007399  nervous system development                         797        0.0544       6          0.2069       0.00336      Mbp,Sema6a,Cldn11,Sox10,Efnb3,Ugt8a
GO:0022410  circadian sleep/wake cycle process                 2          0.0001       1          0.0345       0.003949     Ptgds
GO:0050802  circadian sleep/wake cycle, sleep                  2          0.0001       1          0.0345       0.003949     Ptgds
GO:0006601  creatine biosynthetic process                      2          0.0001       1          0.0345       0.003949     Gatm
GO:0042749  regulation of circadian sleep/wake cycle           2          0.0001       1          0.0345       0.003949     Ptgds
GO:0019695  choline metabolic process                          2          0.0001       1          0.0345       0.003949     Enpp6
GO:0045187  regulation of circadian sleep/wake cycle, sleep    2          0.0001       1          0.0345       0.003949     Ptgds
GO:0006643  membrane lipid metabolic process                   57         0.0039       2          0.069        0.005449     Fa2h,Ugt8a
GO:0006631  fatty acid metabolic process                       188        0.0128       3          0.1034       0.005451     Elovl1,Fa2h,Ptgds
GO:0006600  creatine metabolic process                         3          0.0002       1          0.0345       0.005912     Gatm
GO:0042745  circadian sleep/wake cycle                         3          0.0002       1          0.0345       0.005912     Ptgds
GO:0042396  phosphagen biosynthetic process                    3          0.0002       1          0.0345       0.005912     Gatm
GO:0006629  lipid metabolic process                            664        0.0453       5          0.1724       0.007391     Elovl1,Enpp6,Fa2h,Ptgds,Ugt8a
GO:0051642  centrosome localization                            4          0.0003       1          0.0345       0.007868     Sema6a
GO:0016198  axon choice point recognition                      4          0.0003       1          0.0345       0.007868     Efnb3
GO:0006873  cellular ion homeostasis                           218        0.0149       3          0.1034       0.008071     Mbp,Cldn11,Ugt8a
GO:0055082  cellular chemical homeostasis                      225        0.0153       3          0.1034       0.008766     Mbp,Cldn11,Ugt8a
GO:0019752  carboxylic acid metabolic process                  441        0.0301       4          0.1379       0.009002     Gatm,Elovl1,Fa2h,Ptgds
GO:0006082  organic acid metabolic process                     442        0.0302       4          0.1379       0.009068     Gatm,Elovl1,Fa2h,Ptgds
GO:0019226  transmission of nerve impulse                      233        0.0159       3          0.1034       0.009599     Mbp,Cldn11,Ugt8a
GO:0042752  regulation of circadian rhythm                     5          0.0003       1          0.0345       0.009816     Ptgds
GO:0048512  circadian behavior                                 5          0.0003       1          0.0345       0.009816     Ptgds
GO:0030431  sleep                                              5          0.0003       1          0.0345       0.009816     Ptgds
GO:0048484  enteric nervous system development                 5          0.0003       1          0.0345       0.009816     Sox10
GO:0006599  phosphagen metabolic process                       5          0.0003       1          0.0345       0.009816     Gatm
GO:0042439  ethanolamine and derivative metabolic process      5          0.0003       1          0.0345       0.009816     Enpp6
GO:0032787  monocarboxylic acid metabolic process              248        0.0169       3          0.1034       0.011275     Elovl1,Fa2h,Ptgds
GO:0050801  ion homeostasis                                    248        0.0169       3          0.1034       0.011275     Mbp,Cldn11,Ugt8a
GO:0048013  ephrin receptor signaling pathway                  6          0.0004       1          0.0345       0.011757     Efnb3
GO:0009247  glycolipid biosynthetic process                    7          0.0005       1          0.0345       0.01369      Ugt8a
GO:0007622  rhythmic behavior                                  7          0.0005       1          0.0345       0.01369      Ptgds
GO:0007411  axon guidance                                      100        0.0068       2          0.069        0.015604     Sema6a,Efnb3
GO:0006575  cellular amino acid derivative metabolic process   102        0.007        2          0.069        0.016178     Gatm,Enpp6
GO:0019725  cellular homeostasis                               293        0.02         3          0.1034       0.017172     Mbp,Cldn11,Ugt8a
GO:0050910  detection of mechanical stimulus involved in sens  9          0.0006       1          0.0345       0.017534     Slc12a2
GO:0051592  response to calcium ion                            9          0.0006       1          0.0345       0.017534     S100a16

Figure 35: Partial result for gene enrichment test of GO category for candidate oligodendrocyte specific genes
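
The report does not state which test produced the P values in Figure 35. A common choice for GO enrichment is the one-sided hypergeometric test, sketched here under that assumption; the function name hypergeom_tail is hypothetical. The set size of 29 is implied by the figure (3 genes correspond to a frequency of 0.1034), while the reference size is not given: the value 14,650 below is only a rough back-calculation from NB_IN_REF divided by FREQ_IN_REF and is used purely for illustration.

```python
from math import comb


def hypergeom_tail(n_ref, n_ref_term, n_set, n_set_term):
    """One-sided hypergeometric enrichment probability P(X >= n_set_term).

    n_ref:      number of annotated genes in the reference
    n_ref_term: reference genes carrying the GO term (NB_IN_REF)
    n_set:      number of genes in the candidate set
    n_set_term: candidate genes carrying the GO term (NB_IN_SET)
    """
    total = comb(n_ref, n_set)
    upper = min(n_ref_term, n_set)
    favorable = sum(
        comb(n_ref_term, i) * comb(n_ref - n_ref_term, n_set - i)
        for i in range(n_set_term, upper + 1)
    )
    return favorable / total


# Illustrative call for the "axon ensheathment" row; n_ref is an ASSUMED
# approximate value, so the result is not expected to match the figure exactly.
print(hypergeom_tail(n_ref=14650, n_ref_term=33, n_set=29, n_set_term=3))
```

The same call can be repeated for every GO term; because many terms are tested, a multiple-testing correction would normally be applied before interpreting the smallest P values.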


We also identified relevant pathways in which each of the candidate oligodendrocyte-specific genes
is involved. As we can see from Figure 36, the axon guidance pathway was again detected as a highly
relevant pathway for oligodendrocyte-specific genes. Currently, we are still investigating
other pathways and GO functions.


Rank in list  Symbol in list  Symbol in pathway  Pathways
5             Car2            CA2                NITROGEN_METABOLISM
6             Cldn11          CLDN11             HSA04670_LEUKOCYTE_TRANSENDOTHELIAL_MIGRATION
7             Edg2            EDG2               SMOOTH_MUSCLE_CONTRACTION
8             Efnb3           EFNB3              HSA04360_AXON_GUIDANCE
10            Enpp6           ENPP6              HSA00565_ETHER_LIPID_METABOLISM
12            Galnt6          GALNT6             O_GLYCAN_BIOSYNTHESIS
13            Gatm            GATM               UREA_CYCLE_AND_METABOLISM_OF_AMINO_GROUPS
14            Gpr37           GPR37              PARKINPATHWAY
17            Map2k6          MAP2K6             TOLLPATHWAY
19            Mbp             MBL2               HSA04610_COMPLEMENT_AND_COAGULATION_CASCADES
26            Plxnb3          PLXNB3             HSA04360_AXON_GUIDANCE
27            Ptgds           PTGDS              PROSTAGLANDIN_SYNTHESIS_REGULATION
29            Sema6a          SEMA6A             HSA04360_AXON_GUIDANCE
36            Unc5b           UNC5B              HSA04360_AXON_GUIDANCE
37            Vegfb           VEGFB              HSA05219_BLADDER_CANCER

Figure 36: Pathway analysis for candidate oligodendrocyte specific genes

3.3.2.  Astrocytes


Figure 37 shows the ISH brain image and corresponding expression image of the gene Acaa2
(acetyl-Coenzyme A acyltransferase 2). This gene is identified as one of the candidate
astrocyte-specific genes by our EigenBrain approach. As we can see, the wall of the ventricle
region is very clearly expressed in both sections, a clear indication of astrocyte-specific genes.
Acaa2 encodes a protein catalyzing the last step of the mitochondrial fatty acid beta-oxidation
spiral, and this function is also confirmed by the Gene Ontology processes, which reveal fatty acid
metabolic process as a major related process [3]. Additionally, we found that this gene is related
to leucine, isoleucine and valine metabolism, as might be expected in astrocytes.


Pathways and Processes

GeneGo Pathway Maps
No.
1   Bile Acid Biosynthesis
2   Bile Acid Biosynthesis / Rodent version
3   Butanoate metabolism
4   Leucine, isoleucine and valine metabolism p.2
5   Leucine, isoleucine and valine metabolism / Rodent version
6   Lysine metabolism
7   Lysine metabolism / Rodent version
8   Mitochondrial ketone bodies biosynthesis and metabolism
9   Mitochondrial long chain fatty acid beta-oxidation
10  Mitochondrial unsaturated fatty acid beta-oxidation
11  Peroxisomal branched chain fatty acid oxidation
12  Phenylalanine metabolism
13  Phenylalanine metabolism / Rodent version
14  Propionate metabolism p.2
15  Tryptophan metabolism
16  Tryptophan metabolism / Rodent version
17  Tyrosine metabolism p.2 (melanin)

GO Processes
No.  Name
1    cholesterol biosynthetic process
2    fatty acid metabolic process
3    lipid metabolic process
4    metabolic process

Figure 37: ISH brain image and corresponding expression image for the astrocyte-specific gene Acaa2
(acetyl-Coenzyme A acyltransferase 2) and its related Gene Ontology pathway and process


Figure 38 shows the highly enriched Gene Ontology terms among the candidate astrocyte-specific
genes. With P-value < 0.01, there were 19 GO terms potentially related to astrocytes. We
are planning to investigate these GO terms further to discover more biological insights in
the coming quarter.

GO_ID       TERM                             NB_IN_REF  FREQ_IN_REF  NB_IN_SET  FREQ_IN_SET  P_VALUE   GENES_IN_SET
GO:0050878  regulation of body fluid levels  88         0.006        3          0.1875       0.000204  Trp73,F5,F3
GO:0042060  wound healing                    98         0.0067       3          0.1875       0.000277  Gja1,F5,F3
GO:0009611  response to wounding             341        0.0233       4          0.25         0.00099   Gja1,Trp73,F5,F3
GO:0033326  cerebrospinal fluid secretion    1          0.0001       1          0.0625       0.002315  Trp73
GO:0007596  blood coagulation                69

Figure 38: Partial result for gene enrichment test of GO category for candidate astrocyte specific genes

Figure 39 shows the critical pathways related to the candidate astrocyte-specific genes. The valine,
leucine and isoleucine degradation pathway is also known to play a critical role in astrocytes [3].

HSA04512_ECM_RECEPTOR_INTERACTION

Figure 39: Pathway analysis for candidate astrocyte specific genes

3.3.3.  Neurons 


A candidate neuron-specific gene we have identified is Grin1 (glutamate receptor, ionotropic,
NMDA1 (zeta 1)), shown in Figure 40. In particular, we see high expression in a "G-shaped"
region (hippocampal subfields CA1, CA2, CA3 and DG (dentate gyrus)) in the sagittal
cross-section. Grin1 is an NMDA receptor subtype of glutamate-gated ion channels and possesses high
calcium permeability and voltage-dependent sensitivity to magnesium. In particular, it encodes
a protein that plays a role in synaptic plasticity, synaptogenesis, excitotoxicity, memory
acquisition and learning. This function can also be confirmed from the GO enrichment analysis
(Figure 40).


Labeled regions in the image: CA1, CA2, CA3, DG

GeneGo Process Networks
No.  Name
1    Cell adhesion: Synaptic contact
2    Development: Neurogenesis: Synaptogenesis
3    Neurophysiological process: Circadian rhythm
4    Neurophysiological process: GABAergic neurotransmission
5    Neurophysiological process: Long-term potentiation
6    Neurophysiological process: Transmission of nerve impulse
7    Transport: Calcium transport
8    Transport: Manganese transport

GO Processes
No.
1    adult locomotory behavior
2    associative learning
3    calcium ion homeostasis
4    calcium ion transport
5    cation transport
6    cellular calcium ion homeostasis
7    cerebral cortex development
8    conditioned taste aversion
9    ion transport

Figure 40: ISH brain image and corresponding expression image for neuron specific gene: Grin1 (glutamate
receptor, ionotropic, NMDA1 (zeta 1)) and its related Gene Ontology pathway and process


From these data, we identified many significant GO categories that were enriched in neuron-specific
genes, such as synaptic transmission, regulation of synaptic plasticity, and
neurotransmitter transport. Figure 41 lists a subset of the enriched GO terms; biological
validation is needed in further analyses.
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eO_H)  TERM  NBJN_REF  FRECLIN.REF  NBJN_SETFRECLIN_SETP_VALUE  QENESJN.SET 


G0;0051179  localization 

2671 

0.1822 

103 

0.3962 

2.16E-16  Rbp4,Kcnk2,Kcnfl,lcal,Slc2a3,Slit3,Vegfa,NdufalO,Got2,Kif5c.Sh3gl2,Slc36a2,Osbpl8.G 

G0:0006810  transport 

2263 

0.1544 

91 

0.35 

2.42E-15  Rbp4.Kcnk2.Kcnfl,lcal.Slc2a3.NdufalO,Got2.Sh3gl2,Slc36a2,Osbpl8.Grik2,Kpnal,Rab3i 

G0;0051234  establishment  of  localization 

2276 

0.1553 

91 

0.35 

3.44E-15  Rbp4,Kcnk2,Kcnfl,!cal,Slc2a3,Ndufal0.Got2,Sh3gl2,Slc36a2.Osbpl8,Grik2,Kpnal,Rab3i 

G0:0007268  synaptic  transmission 

194 

0.0132 

20 

0.0769 

2.14E-10  lcal,Grik2,Sv2b.Snap25,Pclo,Gabrg2.Egr3,Nrxnl,Grinl,Gipcl.Slc24a2,Park2,Slcl7a7.Svi 

G0;0032940  secretion  by  cell 

217 

0.0148 

20 

0.0769 

1.53E-09  Rbp4.lcal.Rab3c.Sv2b,Scrnl,Snap25,Pclo.Scg5,Nrxnl,Gipcl.Pfkl.Doc2a,Park2,Rims3,Vgi 

G0;0006836  neurotransmitter  transport 

83 

0.0057 

13 

0.05 

2.04E-09  lcal,Sv2b.5nap25,Pclo,Nrxnl,Slc6a7,Slc32al.Park2.Slcl7a7.Rims3.Svn2.Stxla,Nrxn3 

G0:0007267  cell-cell  signaling 

353 

0.0241 

25 

0.0962 

3.21E-09  lcal,Grik2,Sv2b,5nap25,Pclo,Gabrg2,Egr3,Scg5.Nrxnl,Grinl,Gipcl,Pfkl.Slc24a2.Park2.SI 

G0;0019226  transmission  of  nerve  impulse 

233 

0.0159 

20 

0.0769 

5.17E-09  lcal,Grik2,Sv2b,5nap25,Pclo,Gabrg2,Egr3,Nrxnl,Grinl.Gipcl,Slc24a2,Park2,Slcl7a7.Syi 

GO:0003001 | generation of a signal involved in cell-cell signaling | 111 | 0.0076 | 14 | 0.0538 | 8.69E-09 | Ica1,Sv2b,Snap25,Pclo,Scg5,Nrxn1,Gipc1,Pfkl,Park2,Vgf,Pfkm,Syn2,Rapgef4,Nrxn3
GO:0006811 | ion transport | 682 | 0.0465 | 35 | 0.1346 | 1.03E-08 | Kcnk2,Kcnf1,Slc36a2,Grik2,Kcnn2,Kcnmb4,Gabrb2,Kcnj4,Slc9a1,Slc12a3,Gabrg2,Kcnip...
GO:0046903 | secretion | 248 | 0.0169 | 20 | 0.0769 | 1.47E-08 | Rbp4,Ica1,Rab3c,Sv2b,Scrn1,Snap25,Pclo,Scg5,Nrxn1,Gipc1,Pfkl,Doc2a,Park2,Rims3,Vgf...
GO:0001505 | regulation of neurotransmitter levels | 67 | 0.0046 | 10 | 0.0385 | 2.41E-07 | Ica1,Sv2b,Snap25,Pclo,Nrxn1,Park2,Slc17a7,Syn2,Gad2,Nrxn3
GO:0007269 | neurotransmitter secretion | 42 | 0.0029 | 8 | 0.0308 | 5.76E-07 | Ica1,Sv2b,Snap25,Pclo,Nrxn1,Park2,Syn2,Nrxn3
GO:0015672 | monovalent inorganic cation transport | 297 | 0.0203 | 19 | 0.0731 | 1.15E-06 | Kcnk2,Kcnf1,Slc36a2,Kcnn2,Kcnmb4,Kcnj4,Slc9a1,Slc12a3,Kcnip3,Kcns2,Kctd1,Slc17a7...
GO:0030001 | metal ion transport | 431 | 0.0294 | 23 | 0.0885 | 1.92E-06 | Kcnk2,Kcnf1,Kcnn2,Kcnmb4,Kcnj4,Slc9a1,Slc12a3,Kcnip3,Kcns2,Grin1,Slc24a2,Kctd1,Sl...
GO:0007214 | gamma-aminobutyric acid signaling pathway | 15 | 0.001 | 5 | 0.0192 | 4.26E-06 | Gabrg2,[illegible],Gabra1,[illegible],[illegible]
GO:0006812 | cation transport | 499 | 0.034 | 24 | 0.0923 | 6.51E-06 | Kcnk2,Kcnf1,Slc36a2,Kcnn2,Kcnmb4,Kcnj4,Slc9a1,Slc12a3,Kcnip3,Kcns2,Grin1,Slc24a2...
GO:0006813 | potassium ion transport | 151 | 0.0103 | 12 | 0.0462 | 1.30E-05 | Kcnk2,Kcnf1,Kcnn2,Kcnmb4,Kcnj4,Kcnip3,Kcns2,Kctd1,Kcnip4,Kcnq3,Hcn1,Kctd16
GO:0016311 | dephosphorylation | 127 | 0.0087 | 11 | 0.0423 | 1.34E-05 | Ptprj,Dusp1,Ppm2c,Dusp6,Ppm1l,Ptpn5,Mtmr12,Ptprk,Ptprs,Mtmr7,Ppp2ca
GO:0051046 | regulation of secretion | 95 | 0.0065 | 9 | 0.0346 | 4.01E-05 | Rbp4,Ica1,Rab3c,Pclo,Scg5,Pfkl,Park2,Pfkm,Rapgef4
GO:0006470 | protein amino acid dephosphorylation | 103 | 0.007 | 9 | 0.0346 | 7.45E-05 | Ptprj,Dusp1,Ppm2c,Dusp6,Ppm1l,Ptpn5,Ptprk,Ptprs,Ppp2ca
GO:0050804 | regulation of synaptic transmission | 68 | 0.0046 | 7 | 0.0269 | 0.0001704 | Ica1,Grik2,Grin1,Gipc1,Slc24a2,Park2,Cpeb1
GO:0051649 | establishment of localization in cell | 626 | 0.0427 | 24 | 0.0923 | 0.0001976 | Rbp4,Ica1,Grik2,Kpna1,Rab3c,Ipo4,Sv2b,Scrn1,Snap25,Pclo,Scg5,Nrxn1,Gipc1,Pfkl,Doc2...
GO:0051641 | cellular localization | 666 | 0.0454 | 25 | 0.0962 | 0.0002005 | Rbp4,Ica1,Grik2,Kpna1,Rab3c,Ipo4,Sv2b,Scrn1,Snap25,Pclo,Plxna2,Scg5,Nrxn1,Gipc1,Pf...
GO:0006887 | exocytosis | 94 | 0.0064 | 8 | 0.0308 | 0.0002194 | Rab3c,Sv2b,Scrn1,Pclo,Doc2a,Rims3,Rapgef4,Stx1a
GO:0051969 | regulation of transmission of nerve impulse | 71 | 0.0048 | 7 | 0.0269 | 0.0002218 | Ica1,Grik2,Grin1,Gipc1,Slc24a2,Park2,Cpeb1
GO:0031644 | regulation of neurological system process | 75 | 0.0051 | 7 | 0.0269 | 0.0003087 | Ica1,Grik2,Grin1,Gipc1,Slc24a2,Park2,Cpeb1
GO:0048167 | regulation of synaptic plasticity | 35 | 0.0024 | 5 | 0.0192 | 0.0003238 | Grik2,Grin1,Gipc1,Slc24a2,Cpeb1
GO:0046879 | hormone secretion | 60 | 0.0041 | 6 | 0.0231 | 0.0005716 | Pclo,Scg5,Pfkl,Vgf,Pfkm,Rapgef4
GO:0016486 | peptide hormone processing | 10 | 0.0007 | 3 | 0.0115 | 0.0005849 | Pcsk5,Scg5,Pcsk2
GO:0031175 | neuron projection development | 230 | 0.0157 | 12 | 0.0462 | 0.0006042 | Slit3,Mtap2,Kif5c,Grin1,Nefl,Slitrk1,Nrn1,Stmn1,Fezf2,Pak1,Cck,Ntng2
GO:0009914 | hormone transport | 61 | 0.0042 | 6 | 0.0231 | 0.000623 | Pclo,Scg5,Pfkl,Vgf,Pfkm,Rapgef4
GO:0051049 | regulation of transport | 201 | 0.0137 | 11 | 0.0423 | 0.000684 | Rbp4,Ica1,Rab3c,Pclo,Scg5,Pfkl,Park2,Nedd4l,Pacsin1,Pfkm,Rapgef4
GO:0030073 | insulin secretion | 43 | 0.0029 | 5 | 0.0192 | 0.0008341 | Pclo,Pfkl,Vgf,Pfkm,Rapgef4
GO:0016192 | vesicle-mediated transport | 419 | 0.0286 | 17 | 0.0654 | 0.0008385 | Sh3gl2,Rab3c,Sv2b,Elmo1,Scrn1,Pclo,Sorl1,Arf3,Doc2a,Rims3,Icam5,Gata2,Pacsin1,Rin...
GO:0065008 | regulation of biological quality | 1045 | 0.0713 | 32 | 0.1231 | 0.0008575 | Rbp4,Kcnk2,Ica1,Vegfa,Grik2,Pcsk5,Sv2b,Gucy1a3,Snap25,Slc9a1,Pclo,Scg5,Nrxn1,Tmsh...
GO:0031111 | negative regulation of microtubule polymerization o... | 12 | 0.0008 | 3 | 0.0115 | 0.001035 | Mtap2,Mapt,Stmn1
GO:0010817 | regulation of hormone levels | 152 | 0.0104 | 9 | 0.0346 | 0.0011845 | Rbp4,Pcsk5,Pclo,Scg5,Pfkl,Vgf,Pfkm,Rapgef4,Pcsk2
GO:0006796 | phosphate metabolic process | 938 | 0.064 | 29 | 0.1115 | 0.001242 | Ptprj,Dusp1,Ppm2c,Mtap2,Dusp6,Uqcrh,Erbb4,Ppm1l,Ptpn5,Mtmr12,Nlk,Ptprk,Pak7,Rhc...
GO:0006793 | phosphorus metabolic process | 938 | 0.064 | 29 | 0.1115 | 0.001242 | Ptprj,Dusp1,Ppm2c,Mtap2,Dusp6,Uqcrh,Erbb4,Ppm1l,Ptpn5,Mtmr12,Nlk,Ptprk,Pak7,Rhc...
GO:0044057 | regulation of system process | 154 | 0.0105 | 9 | 0.0346 | 0.0012908 | Ica1,Grik2,Gucy1a3,Grin1,Gipc1,Slc24a2,Park2,Thrb,Cpeb1
GO:0007019 | microtubule depolymerization | 13 | 0.0009 | 3 | 0.0115 | 0.0013218 | Mtap2,Mapt,Stmn1
GO:0007409 | axonogenesis | 187 | 0.0128 | 10 | 0.0385 | 0.0013645 | Slit3,Kif5c,Grin1,Nefl,Slitrk1,Nrn1,Stmn1,Fezf2,Cck,Ntng2
GO:0030072 | peptide hormone secretion | 49 | 0.0033 | 5 | 0.0192 | 0.0014867 | Pclo,Pfkl,Vgf,Pfkm,Rapgef4
GO:0018107 | peptidyl-threonine phosphorylation | 14 | 0.001 | 3 | 0.0115 | 0.0016528 | Mtap2,Nlk,Mapk8
GO:0018210 | peptidyl-threonine modification | 14 | 0.001 | 3 | 0.0115 | 0.0016528 | Mtap2,Nlk,Mapk8
GO:0006091 | generation of precursor metabolites and energy | 261 | 0.0178 | 12 | 0.0462 | 0.0016662 | Ndufa10,Dlst,Uqcrh,Ndufs2,Pfkl,Ndufv2,Vgf,Uqcr,Pfkm,Txn2,Pfkp,Sdhb

Figure 41: Partial results of the GO category enrichment test for candidate neuron-specific genes
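The report does not state which statistical test produced the p-values in Figure 41. As a minimal sketch, assuming a one-sided hypergeometric (over-representation) test, which is a common choice for GO enrichment, the calculation for one row could look like the following. The reading of the numeric columns (genes annotated to the term in the background, background fraction, genes in the candidate set, candidate fraction) is an inference from the ratios in the table: 35/0.1346 suggests about 260 candidate genes and 682/0.0465 suggests a background of roughly 14,650 genes. Both totals, and the function name `hypergeom_upper_tail`, are assumptions for illustration only.

```python
from math import comb


def hypergeom_upper_tail(N, K, n, k):
    """Return P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: genes in the background, K: background genes annotated to the term,
    n: genes in the candidate set, k: candidate genes annotated to the term.
    """
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("need 0 <= K <= N and 0 <= n <= N")
    upper = min(K, n)            # largest possible overlap
    if k > upper:
        return 0.0
    start = max(k, 0)
    # Exact integer arithmetic; math.comb returns 0 when the draw is impossible.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(start, upper + 1))
    return tail / comb(N, n)


# Illustrative inputs based on the "ion transport" row.
N_BACKGROUND = 14650   # assumption: inferred from 682 / 0.0465, not stated in the report
N_CANDIDATES = 260     # assumption: inferred from 35 / 0.1346, not stated in the report
p_value = hypergeom_upper_tail(N_BACKGROUND, 682, N_CANDIDATES, 35)
print(f"ion transport enrichment p-value (sketch): {p_value:.3g}")
```

Because the background size is only approximate and the actual test is unknown, the printed value should not be expected to reproduce the figure exactly.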


Figure 42 shows the pathways to which the neuron-enriched genes belong. Pathways marked in red,
such as the calcium signaling pathway, the MAPK signaling pathway, the GABA pathway, and the
long-term depression pathway, are confirmed by the literature.
Figure 42: Pathway analysis for candidate neuron-specific genes


3.4. Differentially expressed regions for each cell type

Figure 43 shows the results of unpaired t-tests between the different cell-type specific gene sets,
used to identify brain regions of high (or low) expression based on gene expression intensity. As seen,
oligodendrocyte-specific genes appear to be highly expressed around the cerebral cortex compared with
astrocyte- or neuron-specific genes. Likewise, neuron-specific genes tend to be more highly
expressed in the thalamus (TH) and hippocampal formation (HPF) than genes specific to other cell types.
One caveat must be addressed, however. Because this experiment considers expression intensity,
we cannot conclude that high expression is necessarily a consequence of cell distribution.
We therefore applied the same test at the density level of the original spatial gene expression
images (Figure 44). The density-level test helps rule out the possibility that a high-expression
pattern arises from a relative expression difference within a particular region. For this test,
each expression image is divided into 100 by 100 patches, and for each patch a density value is
calculated that represents how many pixels are expressed. Surprisingly, Figures 43 and 44
show almost the same results. The t-test at the density level indicates that the highly expressed
(dense) regions are enriched for the corresponding cell types. For example, the cerebral cortex (CTX) is
highly enriched for astrocyte-specific gene expression, which suggests that astrocytes may be
densely distributed in this region. By the same reasoning, neurons appear to be densely
distributed in the thalamus (TH) and hippocampal formation (HPF).
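The density-level comparison described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the pipeline used in the report: images are plain nested lists of pixel intensities, a pixel counts as "expressed" when it exceeds a hypothetical `threshold`, density is reported as a fraction of pixels rather than a raw count, and the unpaired test is taken to be Student's two-sample t-test with pooled variance (the report does not say whether a pooled or Welch variant was used). The names `patch_density` and `unpaired_t` are illustrative.

```python
from math import sqrt
from statistics import mean, variance


def patch_density(image, patch, threshold=0):
    """Split a 2-D image into patch x patch tiles and return, per tile,
    the fraction of pixels whose intensity exceeds `threshold`."""
    rows, cols = len(image), len(image[0])
    grid = []
    for r0 in range(0, rows, patch):
        grid_row = []
        for c0 in range(0, cols, patch):
            tile = [image[r][c]
                    for r in range(r0, min(r0 + patch, rows))
                    for c in range(c0, min(c0 + patch, cols))]
            grid_row.append(sum(1 for v in tile if v > threshold) / len(tile))
        grid.append(grid_row)
    return grid


def unpaired_t(a, b):
    """Student's two-sample t statistic with pooled variance.
    Returns (t, degrees_of_freedom); each sample needs at least two values."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    if pooled == 0:
        raise ValueError("pooled variance is zero; t statistic undefined")
    t = (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))
    return t, df


# Toy example: a 4 x 4 image split into 2 x 2 patches.
toy_image = [[0, 5, 0, 0],
             [5, 5, 0, 0],
             [0, 0, 9, 9],
             [0, 0, 9, 9]]
densities = patch_density(toy_image, patch=2)

# Densities of one patch across genes of two hypothetical cell-type groups.
t_stat, dof = unpaired_t([0.1, 0.2, 0.3], [0.2, 0.3, 0.4])
```

In practice the same patch (or voxel) would be compared across all genes of two cell-type groups, and the resulting statistic converted to a p-value and thresholded at p < 0.05 as in Figures 43 and 44.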
Figure 43: Unpaired t-test between different cell-type specific gene images at the intensity level. Results are
thresholded at p < 0.05.


Figure 44: Unpaired t-test between different cell-type specific gene images at the density level. Results are
thresholded at p < 0.05.


4.  Reportable  Outcomes 

Papers: 

Ko, Y., Caballero, J., Glusman, G., Hood, L., and Price, N.D., Spatial expression patterning in cell-type
specific genes. In preparation (2010)

Invited  talks  this  year: 

NP:  Seminar,  Department  of  Bioengineering,  University  of  Illinois,  Urbana-Champaign,  IL,  September  2, 
2010 
NP: Invited talk, 8th International Aegean Conference on Pathways, Networks, and Systems Medicine,
Rhodes, Greece, July 12, 2010

NP:  Panelist  Speaker,  Personalized  Medicine  Symposium,  Research  Triangle  Park,  Durham,  NC,  June  15,  2010 

NP: Translational Biomedical Research Seminar, University of Illinois, Systems approaches to disease diagnosis
and prognosis, April 5, 2010

NP: Seminar, Genome Institute of Singapore, Systems approaches to disease stratification, Jan. 21, 2010

NP:  Seminar,  Department  of  Genetics,  Case  Western  Medical  School,  Systems  medicine  approaches  to 
disease  diagnosis  and  prognosis,  Dec.  9,  2009 

JC: "Computer prediction of blood biomarkers for neurological diseases" presented at the Allen Institute for
Brain Science Data Integration Workshop, March 16th-17th, 2010.

JC: "Computer prediction of blood biomarkers for neurological diseases" presented at the Amgen
Mini-Symposium, March 26th, 2010.


5.  Conclusion 

During this year, we applied our EigenBrain approach to identify candidate cell-type specific genes among
the 20,000 mouse genes represented in the Allen Brain Atlas dataset. The EigenBrain approach
identified specific regions that are highly expressed in each cell type and will provide a basis for further
biological insight relating cell-type-specific expression to different brain regions. We investigated these
high-expression patterns across brain regions for each cell type, and the results and their biological
interpretation are reported here. As a result, we discovered a strong candidate set of brain-specific and
cell-type specific transcripts. Moreover, we applied the region-based clustering method to the in situ
hybridization data of cell-type specific genes. Region-based clustering reveals a dramatic spatial
consistency of neuron-specific genes, sufficient to recapitulate most anatomical brain regions from gene
expression alone. Furthermore, we applied unpaired t-tests between different cell-type specific genes
at two levels: intensity and density. These results helped to characterize the highly expressed brain
regions for specific cell types and to understand cell distribution in conjunction with the density feature. We
also analyzed brain RNA-seq data to measure transcripts present in both human and
mouse brains. We have already detected many brain-specific genes (798 transcripts) that
improve our candidate selection.